NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


MDRepo—an open data warehouse for community-contributed molecular dynamics simulations of proteins

https://doi.org/10.1093/nar/gkae1109

Roy, Amitava; Ward, Ethan; Choi, Illyoung; Cosi, Michele; Edgin, Tony; Hughes, Travis S; Islam, Md Shafayet; Khan, Asif M; Kolekar, Aakash; Rayl, Mariah; et al (November 2024, Nucleic Acids Research)

Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more. Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols. Here, we introduce MDRepo, a robust infrastructure that provides a relatively simple process for standardized community contribution of simulations, triggers common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyberinfrastructure, and is capable of storing petabytes of simulations, while providing high-bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data.
MDRepo – an open environment for data warehousing and knowledge discovery from molecular dynamics simulations

https://doi.org/10.1101/2024.07.11.602903

Roy, Amitava; Ward, Ethan; Choi, Illyoung; Cosi, Michele; Edgin, Tony; Hughes, Travis S; Islam, Md Shafayet; Khan, Asif M; Kolekar, Aakash; Rayl, Mariah; et al (July 2024, bioRxiv)

Background: Molecular Dynamics (MD) simulation of biomolecules provides important insights into conformational changes and dynamic behavior, revealing critical information about folding and interactions with other molecules. This enables advances in drug discovery and the design of therapeutic interventions. The collection of simulations stored in computers across the world holds immense potential to serve as training data for future Machine Learning models that will transform the prediction of structure, dynamics, drug interactions, and more.

A need: Ideally, there should exist an open access repository that enables scientists to submit and store their MD simulations of proteins and protein-drug interactions, and to find, retrieve, analyze, and visualize simulations produced by others. However, despite the ubiquity of MD simulation in structural biology, no such repository exists; as a result, simulations are instead stored in scattered locations without uniform metadata or access protocols.

A solution: Here, we introduce MDRepo, a robust infrastructure that supports a relatively simple process for standardized community contribution of simulations, triggers common downstream analyses on stored data, and enables search, retrieval, and visualization of contributed data. MDRepo is built on top of the open-source CyVerse research cyberinfrastructure, and is capable of storing petabytes of simulations, while providing high-bandwidth upload and download capabilities and laying a foundation for cloud-based access to its stored data.
Full Text Available
Cloud Computing for Research and Education Gets a Sweet Upgrade with CACAO

https://doi.org/10.1145/3569951.3597555

Skidmore, Edwin; Cosi, Michele; Swetnam, Tyson; Merchant, Nirav; Xu, Zhouyun; Choi, Illyoung; Davey, Sean; Frady, Jeremy; Wall, Mariah; Yung, Michelle (July 2023, ACM)

Libra: scalable k-mer based tool for massive all-vs-all metagenome comparisons

https://doi.org/10.1093/gigascience/giy165

Choi, Illyoung; Ponsero, Alise J; Bomhoff, Matthew; Youens-Clark, Ken; Hartman, John H; Hurwitz, Bonnie L (December 2018, GigaScience)

Full Text Available
iMicrobe: Tools and data-driven discovery platform for the microbiome sciences

https://doi.org/10.1093/gigascience/giz083

Youens-Clark, Ken; Bomhoff, Matt; Ponsero, Alise J; Wood-Charlson, Elisha M; Lynch, Joshua; Choi, Illyoung; Hartman, John H; Hurwitz, Bonnie L (July 2019, GigaScience)

Background: Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community.

Findings: The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation–supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP).

Conclusions: iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
Libra: Improved Partitioning Strategies for Massive Comparative Metagenomics Analysis

https://doi.org/10.1145/3217880.3217882

Choi, Illyoung; Ponsero, Alise J.; Youens-Clark, Ken; Bomhoff, Matthew; Hurwitz, Bonnie L.; Hartman, John H. (June 2018, Proceedings of the 9th Workshop on Scientific Cloud Computing)

Big-data analytics platforms, such as Hadoop, are appealing for scientific computation because they are ubiquitous, well-supported, and well-understood. Unfortunately, load balancing is a common challenge when implementing large-scale scientific computing applications on these platforms. In this paper we present the design and implementation of Libra, a Hadoop-based tool for comparative metagenomics (comparing samples of genetic material collected from the environment). We describe the computation that Libra performs and how that computation is implemented using Hadoop tasks, including the techniques used by Libra to ensure that the task workloads are balanced despite nonuniform sample sizes and skewed distributions of genetic material in the samples. On a 10-machine Hadoop cluster, Libra can analyze the entire Tara Ocean Viromes of ~4.2 billion reads in fewer than 20 hours.
Full Text Available
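The Libra abstract describes keeping Hadoop task workloads balanced when sample sizes and k-mer distributions are skewed. The following is a minimal Python sketch of that general idea under stated assumptions, not Libra's actual partitioning code. It counts overlapping k-mers, hashes them into buckets with a hypothetical stable hash, and assigns buckets to reducers using the generic greedy longest-processing-time (LPT) heuristic. The function names `kmer_counts`, `bucket_of`, and `balance_buckets`, the hash scheme, and the choice of LPT are all illustrative assumptions.

```python
"""Sketch: balancing skewed k-mer workloads across a fixed number of reducers.

Assumption: this uses the generic greedy longest-processing-time (LPT)
heuristic; it is NOT taken from the Libra source code.
"""
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def kmer_counts(reads: List[str], k: int) -> Counter:
    """Count every overlapping k-mer in a list of reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bucket_of(kmer: str, num_buckets: int) -> int:
    """Map a k-mer to a bucket deterministically (hypothetical scheme)."""
    # Python's built-in hash() is randomized per process for str,
    # so use a simple polynomial hash that is stable across runs.
    h = 0
    for ch in kmer:
        h = (h * 131 + ord(ch)) % (2 ** 32)
    return h % num_buckets


def balance_buckets(bucket_sizes: Dict[int, int],
                    num_reducers: int) -> Dict[int, List[int]]:
    """Greedy LPT: give each bucket, largest first, to the lightest reducer."""
    # Min-heap of (current load, reducer id); ties break on reducer id.
    heap: List[Tuple[int, int]] = [(0, r) for r in range(num_reducers)]
    heapq.heapify(heap)
    assignment: Dict[int, List[int]] = {r: [] for r in range(num_reducers)}
    for bucket, size in sorted(bucket_sizes.items(),
                               key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[reducer].append(bucket)
        heapq.heappush(heap, (load + size, reducer))
    return assignment


if __name__ == "__main__":
    reads = ["ACGTACGTAC", "TTTTTTTTTT", "ACGTTTGACA"]
    counts = kmer_counts(reads, k=4)
    sizes: Counter = Counter()
    for kmer, n in counts.items():
        sizes[bucket_of(kmer, 16)] += n
    plan = balance_buckets(dict(sizes), num_reducers=3)
    # Total k-mer occurrences handled by each reducer.
    loads = {r: sum(sizes[b] for b in bs) for r, bs in plan.items()}
    print(loads)
```

LPT is used here only because it is simple and has a known guarantee: placing the largest buckets first keeps the most heavily loaded reducer within about 4/3 of the optimal maximum load. A real Hadoop job would express a balancing plan of this kind through its own partitioning logic rather than a standalone function.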
